feat(patch_set): PatchSet::parse_bytes for raw byte input #64
Open
weihanglo wants to merge 26 commits into bmwill:master from
`.lines()` strips line endings, so callers tracking byte offsets need to re-add the `\r\n` or `\n` length manually. Extract the repeated inline pattern into a reusable helper.
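A minimal sketch of what such a helper could look like, not the helper added in this commit. The names `lines_with_ending_len` and `line_offsets` are hypothetical; the idea is to report the terminator length alongside each line so byte offsets stay exact.

```rust
/// Hypothetical helper: yields each line without its terminator, together
/// with the byte length of the terminator that `.lines()` would have dropped.
fn lines_with_ending_len<'a>(input: &'a str) -> impl Iterator<Item = (&'a str, usize)> + 'a {
    input.split_inclusive('\n').map(|raw| {
        if let Some(line) = raw.strip_suffix("\r\n") {
            (line, 2)
        } else if let Some(line) = raw.strip_suffix('\n') {
            (line, 1)
        } else {
            // Final line with no terminator at all.
            (raw, 0)
        }
    })
}

/// Pairs every line with the byte offset at which it starts in `input`.
fn line_offsets(input: &str) -> Vec<(usize, &str)> {
    let mut offset = 0;
    let mut out = Vec::new();
    for (line, ending_len) in lines_with_ending_len(input) {
        out.push((offset, line));
        // Advance past the line body and its original terminator.
        offset += line.len() + ending_len;
    }
    out
}
```

With this shape, callers never have to guess whether a line ended in `\r\n` or `\n`.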
* Parse `diff --git` extended headers
* Split multi-file git diffs at `diff --git` boundaries
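The boundary splitting can be illustrated with a rough sketch. This is an assumption about the general approach, not the parser from this PR, and `split_git_diff` is a hypothetical name. It drops any preamble before the first `diff --git` line and would be confused by a preamble line (for example in a commit message) that itself starts with `diff --git `.

```rust
/// Hypothetical sketch: split a multi-file git diff into one chunk per file,
/// cutting at every line that starts with `diff --git `.
fn split_git_diff(input: &str) -> Vec<&str> {
    // Record the byte offset of every boundary line.
    let mut starts = Vec::new();
    let mut offset = 0;
    for raw in input.split_inclusive('\n') {
        if raw.starts_with("diff --git ") {
            starts.push(offset);
        }
        offset += raw.len();
    }

    // Each chunk runs from its boundary to the next one (or to the end).
    starts
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            let end = starts.get(i + 1).copied().unwrap_or(input.len());
            &input[start..end]
        })
        .collect()
}
```

Hunk body lines always begin with a space, `+`, `-`, or `\`, so they cannot be mistaken for a boundary.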
Compat tests now also cover `git apply`.
Unlike unidiff, gitdiff produces patches for empty file creations/deletions (`0\t0` in numstat) because they carry `diff --git` + extended headers even without hunks. Binary files (`-\t-\t`) are skipped in gitdiff mode for now.
* Added types representing both literal and delta Git binary patches
* Added a parser for the `GIT binary patch` format.
This doesn't include patch application, which will be added in later commits.
The implementation is based on
* Specification from <https://diffx.org/spec/binary-diffs.html>
* Behavior observation of Git CLI
- Add `Binary::Keep` variant (now the default) to `ParseOptions`
- Add `PatchKind::Binary` variant for binary patches
- Parse `GIT binary patch` payload via `parse_binary_patch`
- Handle `Binary files ... differ` as `BinaryPatch::Marker`
- Add `extract_file_op_binary` for file ops without ---/+++ headers
The API was stabilized in 1.73. The lint was added in 1.93. This is required for an MSRV bump to 1.75.
This is a preparation for binary diff application support.
* Git binary patches are compressed with zlib, hence flate2
* zlib-rs (the most performant zlib backend) requires MSRV 1.75.0+, hence the bump.
* Add base85 encoder/decoder and Git delta format decoder.
* Wire them into `BinaryPatch::apply()` and `apply_reverse()` for decoding zlib-compressed, base85-encoded binary payload.
These are feature-gated behind the `binary` feature.
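For readers unfamiliar with the encoding, here is a hedged sketch of decoding one line of a `GIT binary patch` payload. It is not the decoder from this commit: `decode_group` and `decode_line` are hypothetical names, the alphabet and length prefix follow Git's base85 convention, and the zlib inflate step that follows in a real pipeline is left out.

```rust
/// Git's base85 alphabet: digits, upper case, lower case, then 23 symbols.
const ALPHABET: &[u8; 85] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~";

/// Decodes one 5-character group into 4 big-endian bytes.
/// Returns `None` on a bad length, an unknown character, or u32 overflow.
fn decode_group(chunk: &[u8]) -> Option<[u8; 4]> {
    if chunk.len() != 5 {
        return None;
    }
    let mut acc: u32 = 0;
    for &c in chunk {
        let digit = ALPHABET.iter().position(|&a| a == c)? as u32;
        acc = acc.checked_mul(85)?.checked_add(digit)?;
    }
    Some(acc.to_be_bytes())
}

/// Decodes one payload line. The first character gives the decoded byte
/// count: `A`..=`Z` means 1..=26 and `a`..=`z` means 27..=52.
fn decode_line(line: &[u8]) -> Option<Vec<u8>> {
    let (&len_char, body) = line.split_first()?;
    let len = match len_char {
        b'A'..=b'Z' => (len_char - b'A' + 1) as usize,
        b'a'..=b'z' => (len_char - b'a' + 27) as usize,
        _ => return None,
    };
    if body.len() % 5 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(body.len() / 5 * 4);
    for chunk in body.chunks(5) {
        out.extend_from_slice(&decode_group(chunk)?);
    }
    if len > out.len() {
        return None;
    }
    // The last group is zero-padded; drop the padding bytes.
    out.truncate(len);
    Some(out)
}
```

The decoded bytes of all lines are concatenated and then inflated with zlib before being used as a literal or delta payload.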
Both tests now require the `binary` Cargo feature.
Preparation for `PatchSet::parse_bytes(&[u8])` support. No behavior change.
Assumption: Header lines are always ASCII (with `core.quotePath=true`, git's default).
Move dispatch logic to free functions so we can put a private trait bound on them.
* `next_gitdiff_patch`
* `next_unidiff_patch`
* `Iterator::next` -> `next_patch`
Force-pushed from 054f7b1 to 68431bf.
weihanglo (Contributor, Author) commented:
Oops, the assumption is a bit too aggressive to cover the non-binary-patch portion.
Returns the longest valid UTF-8 prefix of the input. This will be used to safely convert binary patch data to `&str` without validating the entire remaining input.
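One way to express this with the standard library is sketched below. This is an illustration rather than the function from this commit, and `utf8_prefix` is a hypothetical name. It relies on `Utf8Error::valid_up_to`, which reports how many leading bytes were valid.

```rust
/// Hypothetical sketch: returns the longest prefix of `bytes` that is valid
/// UTF-8. Validation stops at the first invalid byte sequence.
fn utf8_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => {
            // `valid_up_to` always falls on a character boundary, so the
            // second conversion cannot fail.
            std::str::from_utf8(&bytes[..e.valid_up_to()])
                .expect("prefix up to valid_up_to() is valid UTF-8")
        }
    }
}
```

A truncated multi-byte sequence at the end of the input is treated like any other invalid byte and excluded from the prefix.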
We assume that the file path from the hunk header is UTF-8. (This is not actually true, but we can leave it for a future fix.) Only the `str` constructor and the Iterator impl are exposed for now. `parse_bytes` support comes in a follow-up commit.
Avoid lossy UTF-8 conversion of diff output so non-UTF-8 content round-trips correctly. No more skips in GitDiff mode (except submodules, of course).
Force-pushed from 68431bf to 05cdc37.
weihanglo (Contributor, Author) commented:
Fixed with 33b4e38.
The entire idea is `PatchSet::parse_bytes`, so that non-UTF-8 hunks that Git doesn't consider binary patches can still be safely parsed and applied. With this PR, our history replay test no longer skips any non-UTF-8 patches.
When replaying rust-lang/rust history, it shows
And the 2980 patches were all submodule updates.
Fixes #63